"""
Polynomial Chaos Expansion example: Active Learning for Multiple Surrogate Models
======================================================================================

In this example, we use active learning for construction of optimal experimental design with respect to exploration of
the design domain and exploitation of given surrogate models in form of Polynomial Chaos Expansion (PCE). Active learning is based on :math:`\Theta` criterion recently proposed in

L. Novák, M. Vořechovský, V. Sadílek, M. D. Shields, *Variance-based adaptive sequential sampling for polynomial chaos expansion*, 637 Computer Methods in Applied Mechanics and Engineering 386 (2021) 114105. doi:10.1016/j.cma.2021.114105.
"""

# %% md
# We start with the necessary imports.

# %%

import numpy as np
import matplotlib.pyplot as plt
from UQpy.sampling.ThetaCriterionPCE import *

from UQpy.distributions import Uniform, JointIndependent, Normal
from UQpy.surrogates import *

from UQpy.sampling import LatinHypercubeSampling
from UQpy.sampling.stratified_sampling.latin_hypercube_criteria import *


# %% md
# The example involves a :math:`2D` function with mirrored quarter-circle arc line singularities. The form of the function is give by:
#
# .. math:: f(\mathbf{X})=  \frac{1}{ \lvert 0.3-X_1^2 - X_2^2\rvert + \delta}- \frac{1}{ \lvert 0.3-(1-X_1)^2 - (1-X_2)^2\rvert + \delta}, \quad \mathbf{X} \sim \mathcal{U}[0,1]^2
#
# where the strength of the singularities is controlled by the parameter :math:`\delta`, which we set as :math:`\delta=0.01`.
# 
# 

# %%


def Model2DComplete(X, delta=0.01):
    Y = Model2D1(X) + Model2D2(X)
    return Y


# %% md
# In order to show the possibilities of active learning for multiple surrogate models, we split the function into the two parts as follows:
# 
# .. math:: f_1(\mathbf{X})= \begin{cases} \frac{1}{ \lvert 0.3-X_1^2 - X_2^2\rvert + \delta}-\frac{1}{ \lvert 0.3-(1-X_1)^2 - (1-X_2)^2\rvert + \delta} \quad \text{for} \quad X_1<X_2\\ 0 \quad \text{otherwise} \end{cases}
# 
# and
# 
# .. math:: f_2(\mathbf{X})= \begin{cases} \frac{1}{ \lvert 0.3-X_1^2 - X_2^2\rvert + \delta}-\frac{1}{ \lvert 0.3-(1-X_1)^2 - (1-X_2)^2\rvert + \delta} \quad \text{for} \quad X_1>X_2\\ 0 \quad \text{otherwise} \end{cases}

# %%


def Model2D1(X, delta=0.01):
    M = X[:, 0] < X[:, 1]
    Y = 1 / (np.abs(0.3 - X[:, 0] ** 2 - X[:, 1] ** 2) + delta) - 1 / (
                np.abs(0.3 - (1 - X[:, 0]) ** 2 - (1 - X[:, 1]) ** 2) + delta)
    Y[M] = 0
    return Y


def Model2D2(X, delta=0.01):
    M = X[:, 0] > X[:, 1]
    Y = 1 / (np.abs(0.3 - X[:, 0] ** 2 - X[:, 1] ** 2) + delta) - 1 / (
                np.abs(0.3 - (1 - X[:, 0]) ** 2 - (1 - X[:, 1]) ** 2) + delta)
    Y[M] = 0
    return Y

# %% md
# The mathematical models have independent random inputs, which are uniformly distributed in interval :math:`[0, 1]`.

# %%


# input distributions
dist1 = Uniform(loc=0, scale=1)
dist2 = Uniform(loc=0, scale=1)

marg = [dist1, dist2]
joint = JointIndependent(marginals=marg)

# %% md
# We must now select a polynomial basis. Here we opt for a total-degree (TD) basis, such that the univariate polynomials have a maximum degree equal to :math:`P` and all multivariate polynomial have a total-degree (sum of degrees of corresponding univariate polynomials) at most equal to :math:`P`. The size of the basis is then given by
#
# .. math:: \frac{(N+P)!}{N! P!}
#
# where :math:`N` is the number of random inputs (here, :math:`N=2`).
#

# %%


# realizations of random inputs
# training data
# maximum polynomial degree
P = 10
# construct total-degree polynomial basis and use OLS for estimation of coefficients
polynomial_basis = TotalDegreeBasis(joint, P)

# %% md
# We must now compute the PCE coefficients. For that we first need a training sample of input random variable realizations and the corresponding model outputs. These two data sets form what is also known as an ''experimental design''. In case of adaptive construction of PCE by the best model selection algorithm, size of ED is given apriori and the most suitable basis functions are adaptively selected. Here we have two surrogate models with identical training samples of input random vector and two sets of corresponding mathematical models. This ED represents small initial ED, which will be further extended by active learning algorithm.

# %%


# number of inital traning samples
sample_size = 15

# realizations of input random vector
xx_train = joint.rvs(sample_size)

# corresponding model outputs
yy_train1 = Model2D1(xx_train)
yy_train2 = Model2D2(xx_train)

# %% md
# We now fit the PCE coefficients by solving a regression problem. Here we opt for the :code:`np.linalg.lstsq` method, which is based on the _dgelsd_ solver of LAPACK. This original PCE class will be used for further selection of the best basis functions. Once we have created the PCE containing all basis functions generated by TD algorithm, it is possible to reduce the number of basis functions by LAR algorithm. We create sparse PCE approximations for both mathematical models as follows.

# %%


least_squares = LeastSquareRegression()

# fit model 1
pce1 = PolynomialChaosExpansion(polynomial_basis=polynomial_basis, regression_method=least_squares)
pce1.fit(xx_train, yy_train1)
pceLAR1 = polynomial_chaos.regressions.LeastAngleRegression.model_selection(pce1)

# fit model 2
pce2 = PolynomialChaosExpansion(polynomial_basis=polynomial_basis, regression_method=least_squares)
pce2.fit(xx_train, yy_train2)
pceLAR2 = polynomial_chaos.regressions.LeastAngleRegression.model_selection(pce2)

# %% md
# The active learning algorithm based on $\Theta$ criterion selects the best sample from given large set of candidates coverign uniformly the whole design domain. Here we set number of samples as :math:`n_{cand}=10^4` and use LHS-MaxiMin for sampling, though any sampling technique can be employed.

# %%


# number of candidates for the active learning algorithm
n_cand = 10000

# MaxiMin LHS samples uniformly covering the whole input random space
lhs_maximin_cand = LatinHypercubeSampling(distributions=[dist1, dist2],
                                          criterion=MaxiMin(metric=DistanceMetric.CHEBYSHEV),
                                          nsamples=n_cand)

# candidates for active learning
Xaptive = lhs_maximin_cand._samples

# %% md
# Start of the active learning algorithm interatively selecting :math:`nsamples=400` samples one-by-one. Note that the class :code:`ThetaCriterionPCE` takes a list of surrogate models as input, here we have 2 PCEs approximated 2 mathematical models. The active learning further selects the best candidate in each run by variance-based :math:`\Theta` criterion.

# %%


# total number of added points by the active learning algorithm
naddedsims = 400

# loading of existing ED for both PCEs
Xadapted = xx_train
Yadapted1 = yy_train1
Yadapted2 = yy_train2

# adaptive algorithm and reconstruction of PCE

for i in range(0, int(naddedsims)):
    # create ThetaCriterion class for active learning
    ThetaSampling = ThetaCriterionPCE([pceLAR1, pceLAR2])

    # find the best candidate according to given criterium (variance and distance)
    pos = ThetaSampling.run(Xadapted, Xaptive)
    newpointX = np.array([Xaptive[pos, :]])

    newpointres1 = Model2D1(newpointX)
    newpointres2 = Model2D2(newpointX)

    # add the best candidate to experimental design
    Xadapted = np.append(Xadapted, newpointX, axis=0)
    Yadapted1 = np.r_[Yadapted1, newpointres1]
    Yadapted2 = np.r_[Yadapted2, newpointres2]

    # reconstruct the PCE 1
    pce1.fit(Xadapted, Yadapted1)
    pceLAR1 = polynomial_chaos.regressions.LeastAngleRegression.model_selection(pce1)

    # reconstruct the PCE 2
    pce2.fit(Xadapted, Yadapted2)
    pceLAR2 = polynomial_chaos.regressions.LeastAngleRegression.model_selection(pce2)

    if i % 10 == 0:
        print('\nNumber of added simulations:', i)

# plot final ED
fig, ax_nstd = plt.subplots(figsize=(6, 6))
ax_nstd.plot(Xadapted[:, 0], Xadapted[:, 1], 'ro', label='Adapted ED')
ax_nstd.plot(xx_train[:, 0], xx_train[:, 1], 'bo', label='Original ED')
ax_nstd.set_ylabel(r'$X_2$')
ax_nstd.set_xlabel(r'$X_1$')
ax_nstd.legend(loc='upper left');

# %% md
# For a comparison, we construct also a surrogate model of the full original mathematical model and further run active learning similarly as for the previous reduced models. Note that the final ED for this complete mathematical model should be almost identical as in the previous case with the two PCEs approximating reduced models.

# %%


yy_train3 = Model2DComplete(xx_train)
pce3 = PolynomialChaosExpansion(polynomial_basis=polynomial_basis, regression_method=least_squares)
pce3.fit(xx_train, yy_train3)
pceLAR3 = polynomial_chaos.regressions.LeastAngleRegression.model_selection(pce3)



Xadapted3 = xx_train
Yadapted3 = yy_train3

# adaptive algorithm and reconstruction of PCE
for i in range(0, int(400)):
    # create ThetaCriterion class for active learning
    ThetaSampling_complete = ThetaCriterionPCE([pceLAR3])

    # find the best candidate according to given criterium (variance and distance)
    pos = ThetaSampling_complete.run(Xadapted3, Xaptive)
    newpointX = np.array([Xaptive[pos, :]])
    newpointres = Model2DComplete(newpointX)

    # add the best candidate to experimental design
    Xadapted3 = np.append(Xadapted3, newpointX, axis=0)
    Yadapted3 = np.r_[Yadapted3, newpointres]

    pce3.fit(Xadapted3, Yadapted3)
    pceLAR3 = polynomial_chaos.regressions.LeastAngleRegression.model_selection(pce3)

    if i % 10 == 0:
        print('\nNumber of added simulations:', i)

# plot final ED
fig, ax_nstd = plt.subplots(figsize=(6, 6))
ax_nstd.plot(Xadapted3[:, 0], Xadapted3[:, 1], 'ro', label='Adapted ED')
ax_nstd.plot(xx_train[:, 0], xx_train[:, 1], 'bo', label='Original ED')
ax_nstd.set_ylabel(r'$X_2$')
ax_nstd.set_xlabel(r'$X_1$')
ax_nstd.legend(loc='upper left');
